Manual for mcclust.ext R package
This R package provides post-processing tools for MCMC samples of partitions to summarize the posterior in Bayesian clustering models. Functions for point estimation are provided, giving a single representative clustering of the posterior. To characterize uncertainty in the point estimate, credible balls can be computed.
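The idea of a point estimate from MCMC samples of partitions can be illustrated with a minimal Python sketch. It is not the package's implementation or API. It assumes Binder's loss with equal costs and restricts the search to the sampled partitions, and all function names are hypothetical.

```python
from itertools import combinations


def posterior_similarity(samples):
    """Estimate P(items i and j share a cluster) from sampled label vectors."""
    n = len(samples[0])
    m = len(samples)
    psm = [[0.0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    psm[i][j] += 1.0 / m
    return psm


def expected_binder_loss(labels, psm):
    """Posterior expected Binder loss (equal costs), up to a constant factor."""
    total = 0.0
    for i, j in combinations(range(len(labels)), 2):
        together = 1.0 if labels[i] == labels[j] else 0.0
        total += abs(together - psm[i][j])
    return total


def binder_point_estimate(samples):
    """Return the sampled partition with the smallest expected Binder loss.

    Assumption: the search is limited to the MCMC samples themselves.
    """
    psm = posterior_similarity(samples)
    return min(samples, key=lambda s: expected_binder_loss(s, psm))


if __name__ == "__main__":
    # Four hypothetical MCMC draws of cluster labels for four items.
    draws = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
    print(binder_point_estimate(draws))
```

Restricting the search to sampled partitions keeps the sketch short; a dedicated search over partition space could find a partition with lower expected loss.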
Ultra-fast Deep Mixtures of Gaussian Process Experts
Mixtures of experts have become an indispensable tool for flexible modelling
in a supervised learning context, and sparse Gaussian processes (GP) have shown
promise as a leading candidate for the experts in such models. In the present
article, we propose to design the gating network for selecting the experts from
such mixtures of sparse GPs using a deep neural network (DNN). This combination
provides a flexible, robust, and efficient model which is able to significantly
outperform competing models. We furthermore consider efficient approaches to
computing maximum a posteriori (MAP) estimators of these models by iteratively
maximizing the distribution of experts given allocations and allocations given
experts. We also show that a recently introduced method called
Cluster-Classify-Regress (CCR) is capable of providing a good approximation of
the optimal solution extremely quickly. This approximation can then be further
refined with the iterative algorithm.
Leveraging variational autoencoders for multiple data imputation
Missing data persists as a major barrier to data analysis across numerous
applications. Recently, deep generative models have been used for imputation of
missing data, motivated by their ability to capture highly non-linear and
complex relationships in the data. In this work, we investigate the ability of
deep models, namely variational autoencoders (VAEs), to account for uncertainty
in missing data through multiple imputation strategies. We find that VAEs
provide poor empirical coverage of missing data, with underestimation and
overconfident imputations, particularly for more extreme missing data values.
To overcome this, we employ β-VAEs, which, viewed from a generalized Bayes
framework, provide robustness to model misspecification. Assigning a good value
of β is critical for uncertainty calibration and we demonstrate how this
can be achieved using cross-validation. In downstream tasks, we show how
multiple imputation with β-VAEs can avoid false discoveries that arise as
artefacts of imputation. Comment: 17 pages, 3 main figures, 6 supplementary figures
Pseudo-marginal Bayesian inference for Gaussian process latent variable models
A Bayesian inference framework for supervised Gaussian process latent variable models is introduced. The framework overcomes the high correlations between latent variables and hyperparameters by collapsing the statistical model through approximate integration of the latent variables. Using an unbiased pseudo estimate for the marginal likelihood, the exact hyperparameter posterior can then be explored using collapsed Gibbs sampling and, conditional on these samples, the exact latent posterior can be explored through elliptical slice sampling. The framework is tested on both simulated and real examples. When compared with the standard approach based on variational inference, this approach leads to significant improvements in the predictive accuracy and quantification of uncertainty, as well as a deeper insight into the challenges of performing inference in this class of models.
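The elliptical slice sampling step mentioned in the abstract can be sketched generically in Python. This is not the paper's code. It assumes a zero-mean Gaussian prior with identity covariance to stay short, and the toy likelihood and all names are illustrative.

```python
import math
import random


def elliptical_slice_step(f, log_lik, rng):
    """One elliptical slice sampling update for a N(0, I) prior (assumed)."""
    nu = [rng.gauss(0.0, 1.0) for _ in f]            # auxiliary prior draw
    # Slice height; 1 - random() lies in (0, 1], so the log is defined.
    log_y = log_lik(f) + math.log(1.0 - rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    lo, hi = theta - 2.0 * math.pi, theta            # initial bracket
    while True:
        proposal = [a * math.cos(theta) + b * math.sin(theta)
                    for a, b in zip(f, nu)]
        if log_lik(proposal) > log_y:
            return proposal
        # Shrink the bracket toward theta = 0, the current state.
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)


def run_chain(log_lik, dim, n_steps, seed=0):
    """Run a chain from the origin and return all visited states."""
    rng = random.Random(seed)
    f = [0.0] * dim
    states = []
    for _ in range(n_steps):
        f = elliptical_slice_step(f, log_lik, rng)
        states.append(f)
    return states


if __name__ == "__main__":
    # Toy model: one latent value, one observation y = 2 with unit noise.
    # The exact posterior here is N(1, 0.5).
    chain = run_chain(lambda f: -0.5 * (2.0 - f[0]) ** 2, dim=1, n_steps=5000)
    print(sum(s[0] for s in chain) / len(chain))
```

For a Gaussian process prior with a general covariance matrix, the auxiliary draw `nu` would come from that prior instead, for example through a Cholesky factor.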
Colombian Women’s Life Patterns: A Multivariate Density Regression Approach
Women in Colombia face difficulties related to the patriarchal traits of
their society and the well-known conflict afflicting the country since 1948. In
this critical context, our aim is to study the relationship between baseline
socio-demographic factors and variables associated with fertility, partnership
patterns, and work activity. To best exploit the explanatory structure, we
propose a Bayesian multivariate density regression model, which can accommodate
mixed responses with censored, constrained, and binary traits. The flexible
nature of the models allows for nonlinear regression functions and non-standard
features in the errors, such as asymmetry or multi-modality. The model has
interpretable covariate-dependent weights constructed through normalization,
allowing for combinations of categorical and continuous covariates.
Computational difficulties for inference are overcome through an adaptive
truncation algorithm combining adaptive Metropolis-Hastings and sequential
Monte Carlo to create a sequence of automatically truncated posterior mixtures.
For our study on Colombian women's life patterns, a variety of quantities are
visualised and described, and in particular, our findings highlight the
detrimental impact of family violence on women's choices and behaviors. Comment: to appear in Bayesian Analysis
Bayesian Cluster Analysis: Point Estimation and Credible Balls (with Discussion)
Clustering is widely studied in statistics and machine learning, with
applications in a variety of fields. As opposed to classical algorithms which
return a single clustering solution, Bayesian nonparametric models provide a
posterior over the entire space of partitions, allowing one to assess
statistical properties, such as uncertainty on the number of clusters. However,
an important problem is how to summarize the posterior; the huge dimension of
partition space and difficulties in visualizing it add to this problem. In a
Bayesian analysis, the posterior of a real-valued parameter of interest is
often summarized by reporting a point estimate such as the posterior mean along
with 95% credible intervals to characterize uncertainty. In this paper, we
extend these ideas to develop appropriate point estimates and credible sets to
summarize the posterior of clustering structure based on decision and
information theoretic techniques.
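A credible set around a clustering point estimate can be approximated from MCMC samples, as in the minimal Python sketch below. It assumes the variation of information (in bits) as the distance between partitions and takes the ball radius as the smallest distance that covers the requested fraction of samples. The function names are hypothetical and this is not the paper's code.

```python
from collections import Counter
from math import ceil, log2


def variation_of_information(a, b):
    """VI(a, b) = H(a) + H(b) - 2 I(a, b), in bits, for two label vectors."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    h_a = -sum(k / n * log2(k / n) for k in count_a.values())
    h_b = -sum(k / n * log2(k / n) for k in count_b.values())
    mutual = sum(
        k / n * log2((k / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), k in joint.items()
    )
    return max(0.0, h_a + h_b - 2.0 * mutual)  # clamp rounding error


def credible_ball_radius(estimate, samples, level=0.95):
    """Smallest VI radius around `estimate` holding >= `level` of the samples."""
    distances = sorted(variation_of_information(estimate, s) for s in samples)
    needed = ceil(level * len(distances))      # number of samples to cover
    return distances[needed - 1]


if __name__ == "__main__":
    # Four hypothetical MCMC draws; the second only relabels the first.
    draws = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 0, 1]]
    print(credible_ball_radius([0, 0, 1, 1], draws, level=0.75))
```

The distance is invariant to relabeling, so samples that differ only in cluster labels sit at distance zero from the estimate.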